Scaling a 1500+ model beast: How ClickUp utilizes dbt and Snowflake for cost-effective storage and computation from Coalesce 2023

Michael Revelo, Data Platform Lead at ClickUp, explains how his team optimizes its data pipeline using dbt.

"All of the strategies that we've talked about... once we started implementing them, we actually cut our cost by 50%."

In this session, Michael discusses the strategies ClickUp, a productivity platform, used to scale a monolithic repo while keeping it computationally slim and efficient, as well as how his team uses dbt to save on Snowflake credits.

Scaling a monolithic repo and optimizing computation with dbt

Michael explains the methods ClickUp’s team uses to scale a monolithic repo and improve computational efficiency. The team focused on running a lean operation, optimizing Snowflake credit usage, and maintaining a consistent, efficient data pipeline. By combining these strategies, they cut their costs in half.

Michael states, "We started implementing strategies like Slim CI and deferrals, freshness checks, and incrementals strategically." He emphasizes that these techniques were not applied across the board, but targeted at specific issues.
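As an illustration of what Slim CI with deferral looks like in dbt’s CLI (a generic sketch, not ClickUp’s actual job definition; the state path is hypothetical):

    # build only models changed in the PR plus their downstream children,
    # deferring unmodified upstream references to production artifacts
    dbt build --select state:modified+ --defer --state ./prod-run-artifacts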

Detailing their dbt job setup, Michael highlights that they initially used “deferral by job” but have since switched to “deferral by environment.” He also explains that they opted for the “Run source freshness” checkbox in dbt Cloud rather than a separate freshness run step. This way, even if some of their sources were stale, the build would still succeed, and all dependent downstream resources would be triggered.
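The freshness thresholds themselves are declared on the source in YAML. A minimal sketch, with a hypothetical source name and loaded-at column:

    version: 2

    sources:
      - name: app_events            # hypothetical source name
        loaded_at_field: _loaded_at # hypothetical timestamp column
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        tables:
          - name: events

With the job-level checkbox enabled, dbt Cloud records these results as a freshness step without failing the rest of the run when a source crosses its error threshold.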

Harnessing the power of GitHub and dbt functionality

To better manage their data, the ClickUp team leaned on GitHub and several pieces of dbt functionality. They set up GitHub to restart from failure and used a dbt CI package to re-run a job when it fails. They also implemented a “bypass” macro that lets them manually override the schema and always point it at production.
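Michael doesn’t show the macro itself, but a common way to implement this kind of bypass is to override dbt’s built-in generate_schema_name macro behind a var; the flag and schema names below are hypothetical:

    -- macros/generate_schema_name.sql
    {% macro generate_schema_name(custom_schema_name, node) -%}
        {%- if var('bypass', false) -%}
            {#- hypothetical flag: resolve everything to the prod schema -#}
            analytics
        {%- elif custom_schema_name is none -%}
            {{ target.schema }}
        {%- else -%}
            {{ target.schema }}_{{ custom_schema_name | trim }}
        {%- endif -%}
    {%- endmacro %}

Running with --vars '{bypass: true}' would then resolve every model and ref against the production schema instead of the developer’s sandbox.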

Michael discusses the team's approach to GitHub: "Early on, we kind of went against the grain, and we had a shared beta environment for all of our team members... but as we scaled, we started running into collisions with that approach, so we had to quickly change it."

He also explains how they used dbt functionality: "We moved all of our dbt-utils checks into freshness, mostly because it gave us a nice little UI to look at.” These strategies helped them keep their pipeline slim.
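He doesn’t name the specific checks, but a likely candidate is dbt_utils.recency, which asserts the same “data landed recently” expectation that source freshness covers; the model and column names below are hypothetical:

    # before: a package-based recency test on a staging model
    models:
      - name: stg_events
        tests:
          - dbt_utils.recency:
              datepart: hour
              field: _loaded_at
              interval: 24

Expressed as source freshness instead, the same expectation surfaces in dbt Cloud’s freshness UI rather than as a plain test result.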

Utilizing metadata and observability tools

"We're always being mindful of saving on our Snowflake credit."

To gain better insight into their data, the ClickUp data team turned to metadata and observability tools. They used Hex to collect and visualize metadata, and relied on Snowflake monitoring and Elementary Data for observability. The team also built an incremental table over Snowflake query history to keep a permanent record of all queries.
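A minimal sketch of an incremental model over query history (the model name and column selection are illustrative, not ClickUp’s actual code):

    -- models/stg_snowflake_query_history.sql
    {{ config(materialized='incremental', unique_key='query_id') }}

    select
        query_id,
        query_text,
        warehouse_name,
        user_name,
        total_elapsed_time,
        start_time
    from snowflake.account_usage.query_history
    {% if is_incremental() %}
    -- ACCOUNT_USAGE retains roughly a year of history, so appending
    -- new rows on each run preserves queries beyond Snowflake's window
    where start_time > (select max(start_time) from {{ this }})
    {% endif %}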

"We wanted to make sure we maintain the query history of all time, so we could do a two-year look back for whatever reason."

By leveraging metadata and observability tools, they were able to join metadata to queries, compute dbt and Snowflake costs, and monitor resource- and model-level costs. This gave them visibility across both the micro and macro levels of their data pipeline.
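One way to make that join (a sketch assuming dbt’s default query comment, which embeds each node_id as JSON in the issued SQL, plus the hypothetical model above):

    -- attribute Snowflake elapsed time back to individual dbt models
    with history as (
        select
            query_id,
            total_elapsed_time / 1000 as elapsed_s,
            try_parse_json(
                regexp_substr(query_text, '/\\*\\s*(\\{.*\\})\\s*\\*/', 1, 1, 'e')
            ) as dbt_meta
        from {{ ref('stg_snowflake_query_history') }}
    )
    select
        dbt_meta:node_id::string as node_id,
        count(*)                 as query_count,
        sum(elapsed_s)           as total_elapsed_s
    from history
    where dbt_meta is not null
    group by 1
    order by total_elapsed_s desc

Elapsed time is only a proxy for spend; joining against warehouse size or Snowflake’s metering views gets closer to actual credits.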

Michael’s key insights

  • ClickUp’s data team uses a hybrid approach, pairing a centralized core team with embedded analysts
  • ClickUp employs dbt to perform tasks such as standardization of data, removal of deleted and malformed records, and creation of clean models for downstream use
  • ClickUp's dbt job setup includes development, beta, and production environments. They also use Slim CI, cloning, and other strategies to optimize their data pipeline
  • ClickUp uses GitHub for tasks such as restarting from failure, managing PRs, and handling schema per PR
  • ClickUp collects metadata from dbt and Snowflake to provide observability across both the micro and macro levels
  • ClickUp has managed to cut its Snowflake costs by 50% by implementing these strategies